Abstract

In the context of best practices consultation with high needs teachers, we examined (a) relations between teachers’ 
appropriate response to student rule violations and rates of rule violations, and (b) rates of student misbehavior among 
teachers who do and do not achieve various benchmarks of integrity and/or growth in skills. Participants were 48 
teachers, 48 target students with or at risk for attention deficit hyperactivity disorder (ADHD; one per teacher), and 
remaining students in each classroom. Teachers received up to eight consultation sessions on classroom management and 
implementation of a daily report card (DRC) with the target student. We observed classwide rule violations, target student 
rule violations, and DRC violations, as well as the percentage of rule violations to which the teacher provided an appropriate 
response. Teachers who responded to a higher percentage of rule violations had fewer classwide rule violations (rs = −.32 
to −.53) and target student rule violations (rs = −.22 to −.51) at baseline, Months 1 to 2, and Months 3 to 4 of consultation. 
Teachers who reached the minimum benchmark of 51% appropriate response and who demonstrated greater growth in 
appropriate responding witnessed fewer rule violations than teachers who did not achieve these benchmarks. Implications 
for preservice training, professional development, and consultation are discussed. 
About 10% to 20% of children exhibit disruptive behaviors 
that are difficult for teachers to manage (Fabiano et al., 
2013; Visser et al., 2014). If not addressed, disruptive stu- 
dent behavior detracts from instruction time (Robb et al., 
2011) and contributes to teacher stress and occupational 
attrition (Greene, Beszterczey, Katzenstein, Park, & Goring, 
2002; Ingersoll, 2001). When teachers have difficulty man- 
aging challenging behaviors, they often seek assistance 
from an experienced peer colleague, school psychologist, or 
building-level problem-solving team. Problem-solving con- 
sultation (Frank & Kratochwill, 2014), coupled with obser- 
vation and performance feedback (Solomon, Klein, & 
Politylo, 2012) from a trained school professional, repre- 
sents current best practices for facilitating teachers’ use of 
universal or targeted classroom management strategies that 
address challenging student behavior. 

Despite the large literature on consultation and coaching 
(see Stormont, Reinke, Newcomer, Marchese, & Lewis, 
2015, for review), there are few studies that document the 
magnitude of growth in teacher behavior or the level of 
intervention integrity needed to produce change in student 
behavior. Our knowledge of this relationship is limited for 
two reasons. First, most teacher consultation studies have 
reported impacts on proximal outcomes (i.e., teacher knowledge, 
skills, or efficacy), with fewer reporting distal outcomes 
like student behavior or achievement (see Pas, 
Bradshaw, & Cash, 2014, for review). Second, most teacher 
consultation studies that report student outcomes are 
single-case designs (e.g., DiGennaro, Martens, & 
Kleinmann, 2007; Reinke, Lewis-Palmer, & Merrell, 2008). 
These studies allow us to draw conclusions about the func- 
tional relationship between change in teacher behavior and 
change in student behavior; however, they are limited in 
sample size and scope (e.g., restricted to a few weeks). Both 
of these factors limit the extent to which the findings gener- 
alize to typical classroom conditions. 

Unless the consultation-related change in teacher behav- 
ior is linked to change in student behavior, even the most 
effective consultation may represent a squandering of 
expensive resources. In addition, school mental health pro- 
fessionals (SMHPs) have many competing demands for 
their attention, leaving limited time for consultation to 
teachers. The identification of minimum benchmarks for 
integrity could offer guidelines to determine which teachers 
may benefit most from consultation (e.g., high need teach- 
ers). Furthermore, a foundational tenet of multitiered sys- 
tems of support is that the level of intervention intensity can 
be reduced or intensified based on the student’s response to 
a given level of intervention. If we can develop benchmarks 
for integrity for universal and targeted classroom interven- 
tion, SMHPs could have greater confidence in knowing 
whether a lack of response to an intervention is a function 
of the student or inadequate intervention implementation. 
Finally, benchmarks could offer guidelines for training 
teachers in classroom management practices and policies 
related to teacher evaluation. 

The aims of this study were to (1) examine the relations 
between teachers’ appropriate response to rule violations 
and rates of student rule violations (classwide and by a tar- 
get student) in the context of best practices consultation 
over 4 months and (2) explore rates of student rule viola- 
tions (classwide and by a target student) among teachers 
who do and do not achieve (and maintain) various bench- 
marks of integrity or growth in the skill of responding to 
rule violations. 


Intervention Integrity in Consultation 
Research 


Intervention integrity refers to the consultee’s implementa- 
tion of a given strategy (universal or targeted) in a natural 
setting following consultation (Wilkinson, 2007). This is 
considered not only a proximal outcome of consultation 
but also the process through which distal outcomes (e.g., 
change in student behavior) are achieved. However, little is 
known about the magnitude of growth in teacher behavior 
or the level of integrity needed to produce change in stu- 
dent behavior. Although a functional relationship between 
intervention integrity and student outcomes following con- 
sultation has been documented in some studies, Noell and 
Gansle (2014) in their review of studies examining this 
relationship conclude “that the literature is too diverse and 
fractured at present to yield broad conclusions” (p. 400). 
Thus, additional study of the relationship with specific 
attention to benchmarks for integrity and growth in skills is 
warranted. 


Connections Between Change in 
Teacher Behavior and Change in 
Student Behavior 


Several single-case, multiple-baseline design studies dem- 
onstrate a functional relationship between change in 
teacher behavior (e.g., intervention integrity) and change 
in observed student behavior. Sanetti, Collier-Meek, Long, 
Kim, and Kratochwill (2014) found a connection between 
teachers’ implementation of a behavior support plan and 
change in observed student disruptive behavior and aca- 
demic engagement for two of three cases in the context of 
a consultation program that included implementation 
planning as a strategy to promote integrity. Similarly, 
Reinke et al. (2008) found that consultation that included 
motivational enhancement strategies and visual perfor- 
mance feedback increased teachers’ use of praise and 
simultaneously decreased observed student disruptions for 
three of four student-teacher dyads. Furthermore, in a 
multiple-baseline design study, DiGennaro et al. (2007) 
examined the correlations between teacher integrity and 
target student behavior. They found significant correlations 
for three of four cases (rs ranged from −.45 to −.78) 
indicating higher integrity was associated with lower fre- 
quency of problem behavior. Finally, in a case study 
design, we found (Coles et al., 2015) a visual relationship 
between specific strategies used in consultation (skills 
practice, beliefs modification strategy), observed teacher 
integrity (increase in labeled praise, appropriate response 
to student rule violations), and observed student outcomes 
(decrease in off-task rule violations and noncompliance, 
increase in work completion). 

These studies provide promising evidence that integrity 
is associated with change in student behavior; however, 
the finding may not be representative of the duration of 
strategy implementation often required over the course of 
the school year. Furthermore, these studies did not test 
minimum benchmarks of integrity needed to produce 
change in student behavior. Indeed, in some studies, a 
small proportion of students did not respond to the inter- 
vention, possibly suggesting that the level of intervention 
integrity was not sufficient. 

There are several group design studies examining the 
impact of consultation or coaching on teacher behavior 
(e.g., Becker, Bradshaw, Domitrovich, & Ialongo, 2013; 
Bradshaw, Pas, Goldweber, Rosenberg, & Leaf, 2012; 
Motoca et al., 2014); however, there are very few group 
design consultation studies that examine change in 
observed teacher practices and change in observed student 
behavior (Reinke et al., 2014). Conroy and colleagues’ 
studies provide the most relevant evidence regarding this 
relationship. Across two studies, Conroy and colleagues 
(Conroy et al., 2015; Conroy, Sutherland, Vo, Carr, & 
Ogston, 2014) examined the impact of coaching with per- 
formance feedback over 14 weeks on teachers’ use of 
effective universal classroom management strategies and 
on concomitant change in student behavior. In both studies, 
they found that, relative to the baseline phase, teachers 
improved meaningfully in the use of most strategies fol- 
lowing the coaching phase. They also observed concomi- 
tant changes in observed student behaviors (increases in 
engagement and decreases in defiance, aggression, and dis- 
ruptive behavior) that were not observed in the control con- 
dition (Conroy et al., 2015). 

There are noteworthy patterns in their data. First, growth in 
teacher behavior was significant (e.g., use of praise and correc- 
tive feedback increased from 1% to 7% of observed intervals), 
yet there remained room for growth in the use of the strategies 
(Conroy et al., 2015). Second, neither these authors nor others 
(e.g., Reinke et al., 2014) tested for minimum benchmarks of 
integrity. Third, Conroy et al. (2015) did not consider variabil- 
ity in the teacher sample to determine whether the magnitude 
of teacher growth was related to the magnitude of improve- 
ments in student behavior. Both benchmarks and growth may 
be important, particularly among the teachers with the lowest 
levels of integrity at baseline (e.g., high need teachers). 


Variability in Teacher Profiles 


There is emerging evidence of variability in teachers’ 
knowledge, beliefs, and skills prior to consultation (Owens, 
Coles et al., 2017; Owens, Holdaway et al., 2017; Reddy, 
Fabiano, Dudek, & Hsu, 2013), receptivity to consultation 
(Owens, Schwartz et al., 2017), and intervention imple- 
mentation (Domitrovich et al., 2015; Owens, Coles et al., 
2017). For example, we (Owens, Holdaway et al., 2017) 
examined teachers’ use of praise and appropriate response 
to rule violations prior to engaging in consultation. 
Teachers’ rates of praise per hour ranged from 11 to 38 and 
the percentage of rule violations to which teachers 
responded appropriately ranged from 27% to 47% for 
classwide rule violations and 11% to 31% for target student 
rule violations (a student with or at risk for attention deficit 
hyperactivity disorder [ADHD]). Theoretically, this vari- 
ability can be leveraged to determine whether there are 
minimum benchmarks of teacher behaviors that produce 
desired levels of student behavior, as well as which teach- 
ers may have the greatest need for consultation. 

Similarly, there is variability in the extent to which teach- 
ers respond to coaching (e.g., Becker, Darney, Domitrovich, 
Keperling, & Ialongo, 2013; Cappella et al., 2012; Owens, 
Coles et al., 2017). We (2017a) evaluated the effectiveness 
of an individually tailored, multicomponent consultation 
package (designed to address barriers to integrity) relative to 
a comparison condition designed to represent best practices 
(problem solving with performance feedback) with elemen- 
tary school teachers. Teachers in both conditions showed 
significant improvements in universal and targeted class- 
room management strategies. However, the group of high 
need teachers (i.e., those with lower baseline levels of 
knowledge, skills, and intervention-supportive beliefs) dem- 
onstrated more improvement in response to the multicompo- 
nent consultation than in response to the comparison 
consultation (Cohen’s d ranged from 0.33 to 1.12). Similar to 
the findings of Conroy et al., despite demonstrating mean- 
ingful growth in strategy use, there was ample room for 
improvement after consultation. On average, high need 
teachers who received the multicomponent consultation 
improved from responding appropriately to 14% of target 
student rule violations and 28% of classwide rule violations at baseline, 
to responding to 40% and 68% of rule violations, respec- 
tively, at the end of consultation, with wide variability across 
teachers. 

These findings show that individually tailored consul- 
tation that directly addresses barriers to integrity can result 
in improved teacher outcomes relative to current best 
practices. However, even individually tailored consulta- 
tion results in variable teacher change over time, with 
many teachers showing room for continued growth after 
consultation. Thus, aiming for 100% integrity (or 
even 80%) may be both unrealistic and unnecessary. Additional 
research is needed to identify student outcomes associated 
with various benchmarks of integrity and various degrees 
of growth in teacher skills. 


Possible Benchmarks for Integrity 


Three studies provide evidence that a modest benchmark of 
integrity (i.e., between 51% and 66%) may be sufficient to 
achieve change in student behavior. First, using a multiple- 
baseline design, Noell, Gresham, and Gansle (2002) exam- 
ined the impact of three levels of intervention integrity 
(prompts were provided for 100%, 66%, or 33% of prob- 
lems) to remind a student to use a strategy. The pattern of 
results indicated that there was a clear distinction in stu- 
dent outcomes (digits correct) between baseline levels (no 
prompts) and the 100% condition, as well as between base- 
line levels (no prompts) and the 66% condition. 
Furthermore, the 100% condition produced better out- 
comes than the 33% condition. However, the distinction 
between 66% and 100% was negligible and described as 
idiosyncratic. Thus, perhaps a benchmark lower than 100% 
is adequate for changing student behavior. 

Second, in our (2017b) study, we observed the percent- 
age of student rule violations (classwide) to which teach- 
ers provided an appropriate response. Teachers were 
sorted into a variety of groups based on their percentage of 
appropriate responding to student rule violations, and 


Journal of Emotional and Behavioral Disorders 00(0) 


student disruptive behavior was compared across groups. 
When the percentage of rule violations to which the 
teacher responded appropriately was less than 30%, rule 
violations were high (70 or 80 per hr); however, when a 
benchmark of 51% appropriate response to rule violations 
was reached, rule violations dropped to about 35 per hr. 
Furthermore, there was little incremental benefit at higher 
levels of appropriate responding. 

Third, Sanetti et al. (2014) found that their consultation 
procedures improved two of three teachers’ adherence rates 
(i.e., percentage of intervention steps implemented) from 
below 50% at baseline to at least 80%, and this change was 
associated with concomitant improvement in student behav- 
ior. Yet, the third teacher only improved adherence from 
44% to 55% and her student showed minimal change in 
observed outcomes. Together, these data suggest that there 
may be a minimum (e.g., 51% or 55%) needed to produce a 
change in student behavior. 

Several benchmarks could be examined, as the previ- 
ous studies suggest 51%, 55%, and 66% may be possible 
minimum benchmarks (Owens, Holdaway et al., 2017; 
Noell et al., 2002; Sanetti et al., 2014). However, we 
argue that examination should begin at 51% integrity. 
Namely, as long as teachers are applying a strategy “more 
often than not,” they are creating predictability in the 
classroom and following through on their expectations. 
Furthermore, we are not aware of any study that has tested 
a minimum benchmark; thus, examining the lowest pos- 
sible, theoretically defensible minimum seems prudent. 
Clearly, it is also worth examining if there are incremen- 
tal benefits with higher levels. We also hypothesize that 
the minimum benchmark of integrity required to modify 
the behavior of students with severe behavior problems 
may be higher than 51% (as seen in the Noell et al. and 
Sanetti et al. studies). These students are more likely to 
demonstrate unpleasant responses in reaction to teacher 
demands than typical students (Carr, Taylor, & Robinson, 
1991). The principles of negative reinforcement predict 
that these unpleasant responses lead teachers to withdraw 
from using strategies, thereby negatively reinforcing both 
student and teacher behavior. Thus, for these students, 
teachers may need to demonstrate even greater consis- 
tency than what is needed for typical students, to extin- 
guish this negative student response. 


Possible Benchmarks for Growth in 
Teacher Skills 


In addition to a minimum benchmark of integrity, there may 
also be a minimum benchmark for percent growth in teacher 
strategy use for achieving improved student behavior. For 
example, Conroy et al. observed that an average growth of 6 
percentage points in teachers’ use of praise (from 1% to 7%) was 
associated with concomitant change in student behavior. However, 
because we are not aware of any other study that has examined 
percent growth in teacher skills in relation to student 
behavior and because percent growth in a given strategy 
may vary by strategy (e.g., rates of praise, response to rule 
violations), we do not make hypotheses about benchmarks 
for growth. 


Current Study 


In our previous study (Owens, Coles et al., 2017), we 
reported the teacher outcomes of a randomized trial com- 
paring an individually tailored multicomponent consulta- 
tion condition designed to address possible barriers to 
implementation (i.e., low knowledge, skills, and/or inter- 
vention-supportive beliefs) with a consultation condition 
designed to mirror best practices. Teachers in both condi- 
tions received an equal dose of consultation (up to eight 
biweekly sessions) focused on universal classroom man- 
agement strategies and implementation of a targeted daily 
report card (DRC; Owens et al., 2012; Vannest, Davis, 
Davis, Mason, & Burke, 2010) with one student demon- 
strating significant disruptive behavior. Because teachers 
in both conditions showed significant improvements in 
classroom management skills, we combined teachers in 
our current analyses. In addition, our previous analyses 
identified a group of teachers with low levels of knowl- 
edge, intervention-supportive beliefs, and/or skills at 
baseline (i.e., high need teachers; see details in “Method” 
section). In the current study, we conducted our analyses 
on this sample of teachers, as they are likely most in need 
of consultation, have the most room for growth, and allow 
us to examine the benchmark of 51% integrity. 

In the current study, we used the above-described data 
set to examine (Aim 1) relations between teachers’ appro- 
priate response to student rule violations and rates of rule 
violations, and to examine (Aim 2) rates of student misbe- 
havior among teachers who do and do not achieve and/or 
maintain 51% integrity, and who demonstrate various levels 
of growth in this strategy. We examine integrity benchmarks 
and growth in strategy use for both classwide and targeted 
strategies. We expected to find that higher percentages of 
appropriate teacher response to rule violations would be 
associated with lower rates of student rule violations. We 
also hypothesized that teachers who reached a minimum 
benchmark (i.e., an average of appropriately responding to 
51% of rule violations) during the first half of consultation 
would observe less disruptive behavior (in the class and the 
target student) than teachers who did not achieve this bench- 
mark. Furthermore, we expected that teachers who main- 
tained this benchmark during the first and second half of the 
consultation would observe the lowest levels of disruptive 
student behavior. Although we examined student outcomes 
at other benchmarks, we did not make specific hypotheses 
about these other benchmarks. Similarly, we examined 
various rates of growth in skills, but given the lack of literature, 
we did not make specific hypotheses. 

Figure 1. Venn diagram representing the distribution of sample with knowledge, belief, and skills barriers at baseline. Circle labels: Low Knowledge or Beliefs (n = 29), Low % Response to Target Students (n = 29), Low % Response to Other Students (n = 29); total N = 48. 
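
The benchmark groups named in the hypotheses above can be illustrated with a short sketch. It is not the authors' analysis code: the group labels, the function names, and the choice to treat exactly 51% as meeting the benchmark are assumptions made for illustration, and the example values are hypothetical.

```python
# Illustrative sketch only; labels and the ">=" cutoff rule are assumptions.
BENCHMARK = 51.0  # percent of rule violations receiving an appropriate response


def benchmark_group(pct_first_half, pct_second_half, benchmark=BENCHMARK):
    """Classify a teacher by whether the benchmark was reached and maintained.

    pct_first_half: average percent appropriate response in Months 1 to 2.
    pct_second_half: average percent appropriate response in Months 3 to 4.
    """
    reached = pct_first_half >= benchmark
    if reached and pct_second_half >= benchmark:
        return "maintained"
    if reached:
        return "reached_only"
    return "not_reached"


def growth_in_points(pct_baseline, pct_later):
    """Growth in appropriate responding, in percentage points."""
    return pct_later - pct_baseline


# Hypothetical teacher: 28% at baseline, 55% in Months 1 to 2, 68% in Months 3 to 4.
group = benchmark_group(55.0, 68.0)
growth = growth_in_points(28.0, 68.0)
```

Mean rates of rule violations could then be compared across the resulting groups, or across bands of growth, in whatever statistical package is used for the main analyses.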


Method 


Participants 


Participants were 48 general education teachers (Grades K-5; 
21 from Ohio, 27 from Florida) from our previous trial 
(Owens, Coles, et al., 2017) who presented with either (a) 
low baseline levels of knowledge and intervention-support- 
ive beliefs (defined below), or (b) low classroom manage- 
ment skills at baseline with target student, or (c) low 
classroom management skills at baseline with whole class 
(defined below; see Figure 1 for distribution of sample 
across these criteria). These teachers had an average of 14.33 
years (SD = 8.52) of teaching experience. Most (63%) had 
obtained a master’s degree. Most teachers were women (93.8%), 
and teachers identified as non-Hispanic White (45.8%) or 
Hispanic (any race; 50%). The five Ohio schools had an average of 377 
students and 16 general education teachers per school, with 
12% to 29% of students receiving special education services 
and 35% to 75% receiving free or reduced lunch services. 
The three Florida schools had an average of 1,024 students 
and 50 general education teachers, with 4% to 11% receiv- 
ing special education services and 76% to 95% receiving 
free or reduced lunch services. Class size across sites ranged 
from 19 to 25 and the teacher was the sole educator in the 
room. Consultants were postdoctoral fellows (n = 2), mas- 
ter’s level clinicians (n = 2), or graduate students in psychol- 
ogy (n = 5). Six identified as Caucasian, one identified as 
African American, and two identified as Hispanic. 

Target students were 48 elementary school students 
(77.1% male) referred because they demonstrated inatten- 
tive or disruptive behavior. Students identified as Hispanic, 
any race (60%), or non-Hispanic White (39%) and non-His- 
panic Black (1%). Most (91.6%) met criteria for ADHD 
(66.7% combined presentation; 20.8% inattentive presenta- 
tion; 4.2% hyperactive/impulsive presentation) and 8.3% 
were at risk for ADHD (elevated symptoms plus impair- 
ment). The sample had an average IQ estimate of 98.57 (SD 
= 13.01), as assessed by the Wechsler Abbreviated Scales of 
Intelligence, Second Edition (WASI-II; Wechsler, 2011). 
The socioeconomic status of their families was low to 
middle class (15.9% had a household income below $15,000, 
54.2% had an income between $15,000 and $49,999; 22.9% 
were above $50,000; 8.3% did not report income). Per par- 
ent report, 10.4% of the students had been diagnosed with a 
learning disability, 25% had a prescription for a psychiatric 
medication, and 18.8% had repeated a grade. 

As expected, teachers in the current sample differed from 
teachers excluded from the current sample on baseline levels 
of skills, knowledge, and beliefs (all ps < .05). Teachers in 
the excluded sample, on average, completed about one more 
consultation session (M = 7.30, SD = 1.25) than teachers in 
the included sample (M = 5.98, SD = 2.37) and were more 
likely to be kindergarten or second grade teachers. Teachers 
in the two samples did not differ on number of years in the 
profession, gender, or ethnicity, and their target students did 
not differ on gender, ethnicity, severity of ADHD symptoms 
or impairment, or on IQ estimates (all ps > .05). 


Measures 


Measures to identify high needs teachers. To test our 
hypotheses, it was important to have a sample of teachers 
in need of consultation and with ample room for growth. 
Thus, we only included teachers from our previous study 
who were in the latent class composed of teachers with 
low baseline levels of knowledge and intervention-supportive 
beliefs (see Owens, Coles et al., 2017, for detail), 
and teachers who demonstrated a low percentage of appropriate 
response to student rule violations (i.e., below the 
sample median), as evidenced by our baseline observations. 
The measures used to assess knowledge, beliefs, 
and observed teacher behavior are described below. 


Tests of teacher knowledge and beliefs. We assessed teach- 
ers’ knowledge of ADHD (prevalence, etiology, treatment) 
using a 24-item true/false/don’t know test, inspired by Jones 
and Chronis-Tuscano (2008). We assessed teachers’ knowl- 
edge of behavioral principles using a 16-item multiple-choice 
test, inspired by the Behavior Modification Test (Kratochwill, 
Elliott, & Busse, 1995). For both measures, total percent cor- 
rect was calculated. Both measures demonstrate sensitivity to 
change as a function of participating in a workshop focused 
on ADHD and classroom management (Owens, Coles, & 
Evans, 2014). We assessed teacher beliefs using the 25-item 
Teacher Locus of Control measure, which assesses teach- 
ers’ perceptions of personal control and responsibility for 
student academic and behavioral outcomes. The measure is 
reliable (Kuder-Richardson formula 20 [KR20] reliability 
scores were .81 for failure and .71 for success in a previous 
study), and scores are predictive of teachers’ use of tech- 
niques learned during an in-service training (Rose & Med- 
way, 1981). We viewed higher scores (i.e., internal locus of 
control) to be associated with intervention-supportive beliefs. 
The KR20 scores with this sample were .71 for failure and .46 
for success. Given the low score for success (and that removing 
items did not improve the reliability scores), this subscale 
was not used. In our previous study (Owens, Coles et al., 
2017), the above scores were subjected to a latent class analysis, 
which resulted in two classes; teachers in the high need 
class were less knowledgeable and took less credit for their 
students’ failure than teachers in the low need class (effect 
sizes [d] between classes for the above measures ranged from 
1.55 to 2.89; Owens, Coles et al., 2017). 
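
For readers unfamiliar with the KR20 coefficient reported above, the following is a minimal sketch of the standard formula for dichotomously scored items, KR20 = (k / (k − 1)) × (1 − Σ p_i q_i / σ²), where k is the number of items, p_i is the proportion of respondents scoring 1 on item i, q_i = 1 − p_i, and σ² is the variance of total scores. The use of the population variance is one common convention and is an assumption here; the response matrix is hypothetical and is not data from the study.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for 0/1 item scores.

    responses: list of per-respondent lists, each holding one 0/1 score per item.
    Uses the population variance of total scores (an assumed convention).
    """
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two respondents and two items")
    total_var = pvariance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for i in range(k):
        p = sum(r[i] for r in responses) / n  # proportion scoring 1 on item i
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical responses: four respondents, three items.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
reliability = kr20(example)
```

Low values, such as the .46 reported for the success subscale, indicate that the items do not hang together well enough to be summed into a single score.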


Observations of student rule violations and teachers’ 
response to rule violations. Student rule violations (class- 
wide and by the target student) and teachers’ response to 
these rule violations were the primary variables of inter- 
est for the study and were obtained via a modified version 
of the Student Behavior-Teacher Response Observation 
Rating System (SBTR; Pelham, Greiner, & Gnagy, 2008). 
Data from the baseline observations (two to four per 
teacher) were used to identify high need teachers; namely, 
any teacher who fell below the sample median on percent 
appropriate response to classwide or target student rule 
violations was included (see Figure 1). 
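
The skills-based inclusion rule described above (below the sample median on percent appropriate response to classwide or target student rule violations) can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function names and teacher identifiers are hypothetical, "below" is read as strictly less than the median, and the median is taken over whatever set of teachers is passed in.

```python
from statistics import median


def below_median_ids(pct_by_teacher):
    """Return the teachers whose percentage falls strictly below the median."""
    cutoff = median(pct_by_teacher.values())
    return {teacher for teacher, pct in pct_by_teacher.items() if pct < cutoff}


def high_need_by_skills(classwide_pct, target_pct):
    """Flag a teacher who is below the median on either skills variable."""
    return below_median_ids(classwide_pct) | below_median_ids(target_pct)


# Hypothetical baseline percentages of appropriate response, by teacher.
classwide = {"T1": 27.0, "T2": 35.0, "T3": 41.0, "T4": 47.0}
target = {"T1": 31.0, "T2": 11.0, "T3": 18.0, "T4": 25.0}
flagged = high_need_by_skills(classwide, target)
```

Because the rule is a union, a teacher needs to fall below the median on only one of the two variables to be flagged, which is why the circles in Figure 1 overlap.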

The SBTR has adequate interrater reliability, convergent 
validity, and sensitivity to change when used in elementary 
classrooms (Fabiano et al., 2010; Owens, Holdaway et al., 
2017). Using this system, observers obtained frequency 
counts of (a) classroom rule violations (RVs) by the target 
student, (b) violations of DRC target behaviors (DRCRVs) 
by the target student, (c) classroom RVs by all other stu- 
dents, and (d) teacher’s appropriate responses to each RV 
(ARRV). The observation manual includes definitions for 
the violation of seven common classroom rules (i.e., be 
respectful, obey adults, work quietly, remain in seat, raise 
hand to speak, use materials appropriately, stay on task), 
and for coding how the teacher responded to each rule vio- 
lation (i.e., appropriately, inappropriately, or no response). 
In the manual, an appropriate response is defined as any 
verbal or nonverbal action that follows a rule violation to 
provide a response to the behavior. Appropriate responses 
contain appropriate content and are delivered with appro- 
priate affect, with a neutral tone of voice of normal pitch 
and intensity, and without including any behavior included 
in the Inappropriate Response definition (i.e., verbal or non- 
verbal behavior that is antagonistic, accompanied by exces- 
sive or inappropriate gestures, or delivered with 
inappropriate affect or an inappropriate tone of voice). 
Prior to conducting observations, teachers were informed of 
the rules to be coded. Although we acknowledge there are 
differences in rules posted and enforced in elementary 
classrooms, all seven rules were coded for all teachers to 
maintain consistency in the data across teachers. 

Consultants and research assistants (unaware of teacher 
condition) were trained to reliability on the SBTR (see 
details in Owens, Coles et al., 2017). Interobserver 
assessments were conducted for 24% of all observations in 
the trial. Across all frequency variables, the intraclass correlations 
(ICC) of Type 1 for average of k raters (ICC(1,k)) 
as outlined in Shrout and Fleiss (1979) ranged from .78 to 
.98 with an average of .90. Observation durations ranged 
from 15 to 45 min. To standardize the variables across 
observations, frequency counts of target student classroom 
RVs, target student DRCRVs, and other student RVs were 
transformed into rates per hour, and teacher responses are 
presented as percentage of ARRVs for the target student 
RVs, DRCRVs, and other student RVs. To maximize our 
ability to present overall trends, we also created averages of 
the above variables to represent the baseline period, obser- 
vations that occurred during the first half of consultation 
(Months 1-2), and observations that occurred during the 
second half of consultation (Months 3-4). 
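
The standardization described above can be sketched in a few lines. The sketch assumes that each observation is first converted to a rate per hour and a percentage, and that these per-observation values are then averaged within a period; the source does not state whether averaging or pooling of raw counts was used, so that choice, along with the field names and example counts, is an assumption.

```python
from statistics import mean


def rate_per_hour(count, minutes):
    """Convert a frequency count from one observation into a rate per hour."""
    if minutes <= 0:
        raise ValueError("observation duration must be positive")
    return count * 60.0 / minutes


def percent_appropriate(appropriate, violations):
    """Percent of rule violations that received an appropriate response.

    Returns None when no violations were observed (percentage undefined).
    """
    if violations == 0:
        return None
    return 100.0 * appropriate / violations


def summarize_period(observations):
    """Average per-observation rates and percentages within one period."""
    rates = [rate_per_hour(o["rv"], o["minutes"]) for o in observations]
    pcts = [percent_appropriate(o["arrv"], o["rv"]) for o in observations]
    pcts = [p for p in pcts if p is not None]
    return {
        "rv_per_hour": mean(rates),
        "pct_arrv": mean(pcts) if pcts else None,
    }


# Hypothetical baseline observations for one teacher (durations in minutes).
baseline = [
    {"minutes": 30, "rv": 20, "arrv": 5},
    {"minutes": 45, "rv": 30, "arrv": 9},
]
summary = summarize_period(baseline)
```

Converting to rates before averaging keeps a 15-min observation and a 45-min observation on the same scale, which is the purpose of the transformation.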


Study Procedures 


See Owens, Coles et al. (2017) for a complete description 
of procedures. All general education teachers in each ele- 
mentary school were invited to a 3-hr workshop conducted 
by the investigators that focused on best practices in general 
classroom management strategies and the DRC. At this 
time, teachers completed a battery of questionnaires, includ- 
ing those used to determine their status in the current study. 

The larger study from which the data were drawn was 
focused on students with or at risk for ADHD. Thus, teach- 
ers interested in participating in consultation were required 
to identify one student with or at risk for ADHD; consent 
was required by teacher and parent, and assent was required 
by the student. Inclusion criteria for being a target student 
were the following: (a) enrolled in a general education classroom 
(K-5) for at least 50% of the day, (b) had an IQ estimate 
that fell in or above the 90% confidence interval for a 
score of 80, and (c) met diagnostic criteria for Diagnostic 
and Statistical Manual of Mental Disorders (4th ed.; 
DSM-IV; American Psychiatric Association, 1994) ADHD 
or were at-risk for ADHD. ADHD was defined as the pres- 
ence of six or more symptoms of inattention and/or hyperac- 
tivity/impulsivity as reported by parents on the Children’s 
Interview for Psychiatric Syndromes—Parent Version 
(P-ChIPS; Fristad, Teare, Weller, Weller, & Salmon, 1998) 
or the parent- or teacher-version of the Disruptive Behavior 
Disorders Rating Scale (Pelham, Gnagy, Greenslade, & 
Milich, 1992), and impairment in the school setting as 
defined by a rating of at least 3 on the Impairment Rating
Scale (Fabiano et al., 2006). Information obtained from the 
P-ChIPS helped to rule out other disorders as sources of 
ADHD symptoms and to assess the chronicity of symptoms. 
At-risk status was defined as four or more symptoms and 
impairment in school. Children were excluded if they had a 
previous diagnosis of an autism spectrum disorder, bipolar 
disorder, or intellectual disability per parent report. 
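
The symptom and impairment rules above can be read as a simple decision rule. The sketch below is illustrative only: the function and argument names are hypothetical, and it assumes that the symptom thresholds apply to the higher of the two symptom domains, which the text does not spell out for the at-risk definition. The enrollment and IQ criteria are omitted.

```python
def classify_target_student(inattention_symptoms,
                            hyperactive_impulsive_symptoms,
                            impairment_rating,
                            excluded_diagnosis=False):
    """Return 'ADHD', 'at-risk', 'not eligible', or 'excluded'.

    Assumption: thresholds are applied to the larger of the two
    symptom counts (inattention vs. hyperactivity/impulsivity).
    """
    if excluded_diagnosis:
        # Prior autism spectrum, bipolar, or intellectual disability diagnosis.
        return "excluded"
    impaired = impairment_rating >= 3  # school impairment criterion
    max_symptoms = max(inattention_symptoms, hyperactive_impulsive_symptoms)
    if impaired and max_symptoms >= 6:
        return "ADHD"
    if impaired and max_symptoms >= 4:
        return "at-risk"
    return "not eligible"
```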


Once a target student was identified and teacher consent 
obtained, at least two baseline classroom observations using 
the SBTR were conducted, and weekly observations were 
scheduled for the duration of participation in consultation. 
Teachers were paid for attending the in-service training and 
completing questionnaires, but did not receive compensa- 
tion for participating in consultation sessions or for imple- 
mentation of any classroom management practices. 


Consultation Procedures 


In the previous trial, we used stratified random sorting to 
assign teachers to one of two consultation conditions so that 
teachers in each condition did not differ in baseline compe- 
tence ratings (Owens, Coles et al., 2017). Furthermore, teach- 
ers in each condition received an equal number of consultation 
sessions and observations (Owens, Coles et al., 2017). Thus, 
for the current study, teachers were combined across condi- 
tions as the distinction between the conditions was not rele- 
vant to the research questions for this study. Consultation in 
both conditions focused on general classroom management 
strategies (i.e., labeled praise, use of rules, effective instructions,
and appropriate response to RVs) and the use of a DRC
intervention. In both conditions, teachers participated in 
meetings focused on the creation of a DRC. Once the DRC 
was launched, teachers met with consultants every other 
week to receive performance feedback from observations, 
discuss high quality implementation, and problem solve challenges
that arose. Sessions ranged from 30 min to 1 hr, and
were conducted during, before, or after school. 


Results 


Missing Data 


It is important to note that the sample sizes change across 
analyses because only teachers with low baseline scores for 
that given variable are included in a given analysis. For
example, the correlation between teachers’ ARRV of target 
students and target students’ RVs only includes those teach- 
ers who had low baseline scores in either knowledge and 
beliefs or ARRV for target student RVs (i.e., this correlation 
does not include the four teachers who were low baseline 
for other student RVs; see Figure 1). In addition, the sample 
size declines between baseline and Months 3 to 4 for a vari- 
ety of reasons: Three teachers withdrew before Month 3, 
seven students moved out of the teacher’s classroom before 
Month 3, four teachers referred students to the program late 
in the year so they did not start biweekly sessions until late 
winter and thus did not receive 3 months of consultation, 
and in several instances (e.g., applies to four teachers for the 
target student variables), there were 0 rule violations 
observed in the Months 3 to 4 time period; thus, the variable 
of teacher ARRV is not possible and is missing. Because the
reasons for missing data represent typical school practice
(i.e., students moving, natural course of teacher referrals)
and are not related to the research process, and because the
patterns detected at Months 1 to 2 are generally similar to
those found at Months 3 to 4 (see below), we view the data
at all time points to be valid.


Table 1. Correlations Between Rates of Student RVs and Teacher Percent ARRV for Target Student and Other Students by Time.

                                       Teacher variables
                                   Baseline     Months 1-2   Months 3-4
Student variables                  %ARRV        %ARRV        %ARRV
Target student RVs (a)
  Baseline RVs                     -.22
  Months 1-2                       -.21         -.27†
  Months 3-4                       -.27         -.26         -.51**
Target student DRC violations (b)
  Months 1-2                       NA           -.09
  Months 3-4                       NA           -.22         -.21
Other student RVs (c)
  Baseline RVs                     -.34*
  Months 1-2                       -.29         -.32*
  Months 3-4                       [illegible]  -.29         -.53**

Note. RV = rule violation; ARRV = appropriate response to rule violations; DRC = daily report card.
(a) For target student data, n = 44 at baseline, n = 42 at Months 1-2, and n = 25 at Months 3-4.
(b) For DRC data, n = 36 at Months 1-2 and n = 21 at Months 3-4.
(c) For other student data, n = 40 at baseline and Months 1-2 and n = 27 at Months 3-4.
†p < .09. *p < .05. **p < .01.


Aim 1: Association Between Teacher Behavior
and Student Behavior 


We examined correlations between teacher response to rule 
violations and rates of rule violations using data averaged 
during the baseline period, Months 1 to 2, and Months 3 to
4 (see Table 1). Within any given time point, these variables 
are negatively related. For other student RVs, the rs range
from -.32 to -.53 (all ps < .05), with the strongest relation
emerging at Months 3 to 4. For target student RVs, the rs
range from -.22 to -.51, with the strongest relation emerging
at Months 3 to 4. For DRCRVs, rs range from -.09 to
-.21 (all nonsignificant).
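
As a rough illustration of how such a correlation could be computed from per-teacher period averages, the following sketch implements a Pearson correlation that drops teachers with a missing value on either variable. The function name and the handling of missing values are assumptions for illustration, not a description of the authors' software.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation over pairs in which both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Here `xs` would hold each teacher's percent ARRV for a period and `ys` the corresponding rate of rule violations per hour; a teacher with no observed rule violations in that period has an undefined percent ARRV and is excluded from the pair list.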


Aim 2: Student Outcomes Relative to 
Achievement of Benchmarks and Teacher 
Growth 


Achieving the benchmark: Months 1 to 2. First, we identified
the portion of the sample who had achieved an average
of 51% ARRV to target student RVs (n = 8; 16.6%) and
other student RVs (n = 23; 47.9%) during Months 1 to 2.
On average, those meeting the benchmark for target
students responded appropriately to 78% (SD = 21%) of
target student RVs during Months 1 to 2, whereas those
not meeting the benchmark responded to 24% (SD = 17%)
of target student RVs during Months 1 to 2, t(40) = 7.65, p
< .001. Similarly, on average, those teachers meeting the
benchmark for other students responded appropriately to
74% (SD = 16%) of other student RVs during Months 1 to
2, whereas those not meeting the benchmark responded to
28% (SD = 11%) of other student RVs during Months 1 to
2, t(34) = 10.08, p < .001.

Independent-samples t tests were conducted on the
average rate of target student RVs, DRC violations, and
other student RVs between teachers who did and did not
achieve the 51% benchmark during Months 1 to 2 (see the
top half of Table 2). Teachers who met the benchmark
experienced about half as many RVs as teachers who did
not meet this benchmark. Hedges’s g effect sizes are moderate
to large for these comparisons (significant p values
ranged from .07 to .001).
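
For readers unfamiliar with Hedges’s g, the sketch below computes it from group means, standard deviations, and sample sizes, using the pooled standard deviation and the common small-sample correction factor 1 - 3/(4df - 1). The function name is hypothetical, and the exact variant the authors used is not stated, so this should be read as one standard formulation rather than their procedure.

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Bias-corrected standardized mean difference between two groups."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / pooled_sd      # Cohen's d
    correction = 1 - 3 / (4 * df - 1)    # small-sample correction
    return d * correction
```

With illustrative (not study) inputs of two groups of 10, means of 10 and 8, and a common SD of 2, d equals 1 and the correction shrinks it slightly to 68/71.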


Maintaining the benchmark. We also grouped teachers based
on whether they maintained the 51% benchmark across
both Months 1 to 2 and Months 3 to 4 time periods (i.e.,
Maintained Group) or did not (failed to achieve the benchmark
at either time period). Only three teachers (6.25%)
maintained the benchmark toward target student RVs at
both time points and 12 teachers (25%) maintained the
benchmark toward other student RVs at both time points.
On average, those maintaining the benchmark toward target
students responded appropriately to 89% (SD = 18%) of target
student RVs during Months 3 to 4, whereas those not
maintaining the benchmark responded to 42% (SD = 34%)
of target student RVs during Months 3 to 4, t(23) = 2.32,
p = .03. Similarly, on average, those maintaining the benchmark
toward other students responded appropriately to 84%
(SD = 18%) of other student RVs during Months 3 to 4,
whereas those not maintaining the benchmark responded to
47% (SD = 25%) of other student RVs during Months 3 to
4, t(21) = 3.93, p = .001.


Table 2. Rates of RVs for Target Students and Other Student by Time and Teacher Benchmark Groups.

                                          Teacher variables
                           Teachers below 51%     Teachers meeting 51%    Between group
Student variables          benchmark over         benchmark over          effect size (g)
                           Months 1-2             Months 1-2
Target student RVs
  Months 1-2               6.50 (5.74)            3.87 (3.04)             .41
  Months 3-4               4.81 (6.39)            2.06 (1.44)             .46
Target student DRCRVs
  Months 1-2               2.82 (3.69)            0.43 (0.65)**           .71
  Months 3-4               2.41 (5.21)            0.40 (0.57)             .42
Other student RVs
  Months 1-2               47.86 (19.52)          35.93 (20.36)†          .60
  Months 3-4               42.68 (33.72)          21.84 (15.32)*          .80

                                          Teacher variables
                           Teachers not           Teachers maintaining    Between group
Student variables          maintaining a 51%      51% benchmark over      effect size (g)
                           benchmark over         Months 1-4
                           Months 1-4
Target student RVs
  Months 3-4               5.32 (6.28)            1.11 (1.11)             .70
Target student DRCRVs
  Months 3-4               2.54 (5.29)            0.40 (0.70)             .42
Other student RVs
  Months 3-4               44.57 (31.30)          19.55 (13.19)           1.00

Note. The 51% benchmark represents an average of 51% appropriate response for the given type of RV (i.e., target student RV or other student RV).
RV = rule violation.
†p < .09. *p < .05. **p < .01.

Independent-samples t tests were conducted on the average
rate of target student RVs, DRC violations, and other student
rule violations among teachers who did and did not achieve
each maintenance benchmark (see bottom half of Table 2).
Teachers who maintained the benchmark experienced about
half as many RVs as teachers who did not maintain this
benchmark. Hedges’s g effect sizes are moderate to large
for these comparisons (the p value for the significant effect,
other student rule violations, was .011).


Student outcomes as a function of teacher integrity and growth. In
an attempt to replicate the pattern of data provided in Owens,
Holdaway, et al. (2017), teachers were sorted based on their
percent ARRV to target student RVs (see Figure 2a) and other
student RVs (see Figure 2b), and rates of student RVs were
examined across the groups. Because there was minimal
variability in DRCRVs, we did not assess this outcome
further. A few patterns are noteworthy. First, the pattern
across both figures indicates that student rule violations are
highest when teacher percent ARRV is lowest. Second, the
patterns suggest that earlier in the year (i.e., Months 1-2),
there may be slight incremental benefit with higher levels of
percent ARRV beyond the 51% benchmark for both types of
RVs. For example, at Months 1 to 2, target students of teachers
with a 51% ARRV violated an average of four rules per
hour, whereas target students of teachers with 90% or higher
ARRV violated less than one rule per hour (see Figure 2a).
Similarly, at Months 1 to 2, the rate of other student RVs per
hour among teachers with a 51% ARRV was an average of
37, and the rate per hour among teachers with 90% or higher
ARRV was less than 23 (see Figure 2b). However, at
Months 3 to 4, this incremental pattern was not observed.
Finally, to try to better understand this pattern, average
rates of other student RVs were examined considering both
various levels of growth and whether or not teachers achieved
the 51% minimum integrity benchmark. The average rates of
other student RVs for each of these groups are depicted in
Figure 3. As revealed in Table 2, achieving the benchmark of
51% is related to lower rates of rule violations. However, the
pattern depicted in Figure 3 reveals that growth within each
benchmark group is also related to rule violations. For example,
the average rate of rule violations among teachers who
improved less than 20% and never crossed the 51% benchmark
(n = 7) was 55.75 per hour, whereas the average rate among
teachers who improved 20% to 50% and never crossed the 51%
benchmark (n = 5) was 33.77. Furthermore, teachers
who achieved the 51% benchmark and improved <20%
(n = 4) had on average 31.04 rule violations per hour. Those
who achieved the 51% benchmark and improved >50% (n =
4) had 21.92 rule violations per hour. Thus, reductions in rule
violations appear related to both growth in ARRV and
exceeding the minimum benchmark of 51% ARRV.


Figure 2. Rate of (a) target student rule violations/hour and (b) other student rule violations/hour as a function of teacher
appropriate response (AR) group during Months 1 to 2 and Months 3 to 4.
Note. AR = appropriate response.


Figure 3. Rates of other student rule violations at Months 3 to 4 based on teacher’s growth and attainment of 51% benchmark of
appropriate response to rule violations.
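
The growth-by-benchmark grouping can be sketched as follows. This is an illustration under stated assumptions: growth is taken as the change in percent ARRV (in percentage points) from baseline to Months 3 to 4, a teacher counts as having achieved the benchmark if either consultation period averaged at least 51%, and all names are hypothetical. The article does not specify these operational details.

```python
from collections import defaultdict


def growth_band(baseline_arrv, later_arrv):
    """Band the change in percent ARRV (assumed: percentage points)."""
    growth = later_arrv - baseline_arrv
    if growth < 20:
        return "<20%"
    if growth <= 50:
        return "20-50%"
    return ">50%"


def benchmark_group(period_arrvs, benchmark=51.0):
    """'achieved' if any period average reached the benchmark."""
    reached = any(v is not None and v >= benchmark for v in period_arrvs)
    return "achieved" if reached else "never crossed"


def mean_rv_by_group(teachers):
    """Mean Months 3-4 RV rate per (benchmark group, growth band).

    Each teacher is a tuple:
    (baseline_arrv, arrv_months_1_2, arrv_months_3_4, rv_rate_months_3_4)
    """
    rates = defaultdict(list)
    for baseline, m12, m34, rv_rate in teachers:
        key = (benchmark_group([m12, m34]), growth_band(baseline, m34))
        rates[key].append(rv_rate)
    return {key: sum(vals) / len(vals) for key, vals in rates.items()}
```

With cell sizes as small as those reported here (n = 4 to 7), group means from such a tabulation are descriptive only.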
Discussion


This study examined the association between teacher 
response to rule violations and student behavior in the 
context of up to 4 months of teacher consultation. This 
relationship was analyzed in a sample of teachers identi- 
fied as needing consultation supports, and as a function of 
integrity benchmarks and teacher growth in skills. The 
results advance the science of consultation research by 
documenting levels of student outcomes at various levels 
of teacher integrity and growth in skills and by revealing 
a possible minimum benchmark for integrity in teacher 
practices needed to produce desired change in student 
behavior. Such data stimulate a host of testable hypothe- 
ses that will guide our understanding of how change in 
teacher practices mediates the relationship between con- 
sultation and student outcomes. 

First, our study reveals the relationship between teacher 
behavior and student behavior prior to consultation and at 
two time points within the consultation process (over 
Months 1-2 and Months 3-4). As hypothesized, there is a
modest negative relationship between teacher behavior and
student behavior at baseline (rs = -.22 to -.34); however, as
teachers enhance their use of effective practices, this relationship
becomes stronger (rs = -.51 and -.53 at Months 3-4; see
Table 1). These correlations are consistent with previous
findings (DiGennaro et al., 2007) and suggest that students
learn over time how teachers respond to misbehavior
and modify their behavior accordingly. However, this pattern
was not observed for target student DRCRVs. This may
have occurred because behaviors that are targeted on the 
DRC likely represent the behaviors that are most impairing 
to the student and most difficult to change (otherwise a tar- 
geted intervention would likely not have been used). Given 
that there are many other influences on these behaviors 
beyond that of the teacher (e.g., home environment, neuro- 
cognitive development), this specific strategy (i.e., response 
to rule violation) may have a proportionally smaller impact 
on these behaviors relative to other target student behaviors 
and/or to the behavior of other students. Alternatively, this 
pattern may have been found because DRCRVs were 
observed at the lowest rate of the three student variables 
and, as a result, had a more restricted range. 

Second, based on patterns detected in previous studies, 
we hypothesized that achieving a minimal benchmark of 
51% integrity could have a meaningful impact on student 
outcomes. Indeed, the data in Table 2 show that teachers 
who achieved the benchmark by Months 1 to 2 and/or
who maintained the benchmark across both time periods 
witnessed about half the rule violations (among target 
students and other students) as compared with teachers 
who did not achieve or maintain this benchmark (effect 
sizes were medium to large). Given that behavioral infractions
detract from classroom instruction time (Robb et al.,
2011), are stressful for teachers (Greene et al., 2002), and
are a top contributor to teacher job dissatisfaction and 
attrition from the profession (Ingersoll, 2001), reducing 
violations by half would likely have a substantial impact 
on learning time for all students, student-teacher rela- 
tions, and teacher job satisfaction. Furthermore, this find- 
ing offers provocative new ideas for future research on 
teacher training, consultation, and assessing a student’s 
response to intervention (see “Conclusions and 
Implications” section below). 

Third, the patterns in Figures 2a and 2b document the wide
variability in teacher skills throughout consultation. This 
variability suggests that the field needs to shift from a one- 
size-fits-all approach to an individually tailored approach to 
professional development and consultation (Gage, MacSuga- 
Gage, & Crews, 2017; Owens, Coles et al., 2017), as teach- 
ers with different profiles may need different types of 
support to achieve adequate growth and benchmarks. The 
patterns offer support to the hypothesis that a benchmark of 
51% integrity may be a meaningful and reasonable minimum 
benchmark to achieve; however, the data also suggest that 
there are benefits of incremental integrity. Rule violations in 
target students and other students are observed at lower rates 
when teachers achieve 80% or 90% ARRV than when teachers
achieve 40% or 50% ARRV, particularly in the earlier
phases of consultation. In addition, the pattern in Figure 3 
documents the impact that growth (in the context of bench- 
marks) may have on student outcomes. Collectively, these 
findings highlight the need to further develop tools for 
assessing teachers’ willingness to engage in consultation
(Owens, Schwartz et al., 2017), as well as methods for tailor- 
ing consultation to the variety of teacher needs and response 
patterns (e.g., Gage et al., 2017; Owens, Coles et al., 2017; 
Reinke et al., 2008; Sanetti et al., 2014). 

Finally, it is notable that nearly 60% of teachers achieved 
the 51% benchmark for other student ARRV during Months 
1 to 2, yet less than 20% met the 51% benchmark for target 
student ARRV during Months 1 to 2. This supports the 
notion that it is more challenging for teachers to change 
their behavior in relation to target students than other stu- 
dents and that greater attention may need to be devoted to 
teacher behaviors directed toward target students. 


Limitations 


First, the small sample size prevented further statistical 
analyses and the examination of multiple benchmarks for 
growth and integrity. Second, we only analyzed one teacher 
behavior. Clearly, there are multiple important teacher prac- 
tices that affect student behavior and should be assessed 
before results can fully inform teacher evaluation systems.
Third, observers recorded a standardized set of rule viola- 
tions common to elementary classrooms that may not have 
corresponded to the rules posted and enforced in each 
individual classroom. Although this decision increases the
precision of observations, it may affect generalizability of 
results across classrooms. 


Conclusions and Implications 


Because DRC target behaviors represent priority behav- 
iors that are critical to improving a student’s functioning, 
yet were only minimally related to the teacher practice 
studied here, additional work is needed to understand 
how to help teacher behaviors differentially impact these 
impairing behaviors. For example, DRC target behaviors 
may be more difficult to change than behaviors exhibited 
by students who are not at-risk for ADHD and may require 
a high degree of coordinated implementation (i.e., high 
consistency of teacher implementation and high degree of 
home-based implementation) to shift student behavior. In 
addition, that these behaviors were observed at the lowest 
rate of the three student variables highlights the chal- 
lenges associated with capturing, via observation, student 
behaviors associated with targeted interventions and the 
need for the development of adequate systems for study- 
ing such behaviors. 

The findings also stimulate new ideas related to teacher 
training, teacher evaluation systems, and assessing a stu- 
dent’s response to intervention. For example, with addi- 
tional research in teacher training programs and/or 
evaluation systems, we may be able to identify bench- 
marks for minimal competencies and for mastery of a 
given skill, as well as goals and processes for individual- 
ized professional development plans. Investigation is also 
needed to better understand how benchmarks might vary 
across different teacher skills (e.g., response to violations 
vs. praise) and based on intensity of student needs. 
Furthermore, as mentioned previously, SMHPs could 
begin to assess and use benchmarks to determine where 
limited consultation resources should be directed. Such 
benchmarks could be systematically considered prior to 
intensifying an intervention for a student who seems to be 
insufficiently responsive. Namely, school teams could 
systematically assess both benchmarks and student
response before making intervention decisions. Finally, 
these data suggest that further study of benchmarks of 
integrity defined in various increments is warranted with 
larger samples to enhance confidence in the findings and 
further demark minimum benchmarks and meaningful 
increments to achieve. 

This is one of the first studies to examine possible
benchmarks for integrity and/or growth in skills among
high needs teachers receiving best practices consultation.
The findings provide a foundation for a variety of hypoth- 
eses that can be tested related to response to rule viola- 
tions, as well as other critical teacher behaviors. 